31 research outputs found

    On the analysis of the timing behaviour of time randomised caches

    Time Randomised caches (TRc), which can be implemented at hardware level or with software means on conventional deterministic cache designs, have been proposed for real-time systems as key enablers for Probabilistic Timing Analysis (PTA) and in particular its measurement-based variant: Measurement-Based Probabilistic Timing Analysis (MBPTA). A key parameter of MBPTA is the number of runs required to ensure that the execution time measurements taken at analysis time are representative of the execution times that can occur during system deployment, so that MBPTA can be applied trustworthily. In this thesis, we propose several methods to determine whether the number of observations taken at analysis, as part of the normal MBPTA application process, captures the cache events that significantly impact execution time and Worst-Case Execution Time (WCET). If this is not the case, our techniques provide the user with the number of extra runs required so that those cache events are captured, ensuring the trustworthiness of MBPTA-provided WCET estimates. Our techniques have been evaluated using a set of synthetic benchmarks and a real avionics application.
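
    As a rough numerical illustration of the representativeness question addressed in the thesis (not its actual techniques): if a cache event of interest occurs in each run with probability p, the number of runs n needed to observe it at least once with confidence 1 - alpha follows from (1 - p)^n <= alpha. The sketch below assumes hypothetical values for p and alpha.

```c
/* Minimal sketch, not the thesis' actual technique: given an assumed
 * per-run probability p of a relevant cache event, compute how many
 * measurement runs are needed to observe it at least once with
 * confidence (1 - alpha), from (1 - p)^n <= alpha. */
#include <math.h>
#include <stdio.h>

static long runs_needed(double p, double alpha)
{
    if (p <= 0.0 || p >= 1.0 || alpha <= 0.0 || alpha >= 1.0)
        return -1;                                /* parameters out of range */
    return (long)ceil(log(alpha) / log(1.0 - p));
}

int main(void)
{
    /* e.g. an eviction pattern occurring with probability 0.001 per run,
     * to be captured with 99.9% confidence: roughly 6905 runs */
    printf("%ld runs\n", runs_needed(0.001, 0.001));
    return 0;
}
```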

    Smart hardware designs for probabilistically-analyzable processor architectures

    Future Critical Real-Time Embedded Systems (CRTES), like those in planes, cars or trains, require more and more guaranteed performance in order to satisfy the increasing performance demands of advanced complex software features. While increased performance can be achieved by deploying processor techniques currently used in High-Performance Computing (HPC) and mainstream domains, their use challenges software timing analysis, a necessary step in CRTES' verification and validation. Cache memories are known to have a high impact on performance, and in fact, current CRTES include multicores, usually with several levels of cache. In this line, this Thesis aims at increasing the guaranteed performance of CRTES by using cache techniques built upon time randomization and providing probabilistic guarantees on tasks' execution time. In this Thesis, we first focus on improving cache placement and replacement to improve guaranteed performance. For placement, different existing policies are explored in a multi-level cache setup, and a solution is reached in which several of those policies are combined. For cache replacement, we analyze a pathological scenario that no cache policy so far accounts for and propose several policies that fix this pathological scenario. For shared caches in multicores, we observe that contention is mainly caused by private writes that go through to the shared cache, yet using a pure write-back policy also has its drawbacks. We propose a hybrid approach to mitigate this contention. Building on this solution, the next contribution tackles a problem caused by the need for reliability mechanisms in CRTES. Implementing reliability close to the processor's core has a significant impact on performance. A look-ahead error detection solution is proposed to greatly mitigate the performance impact. The next contribution proposes the first hardware prefetcher for CRTES with arbitrary cache hierarchies. Given their speculative nature, prefetchers with a guaranteed positive impact on performance are difficult to design. We present a framework that provides execution time guarantees and obtains a performance benefit. Finally, we focus on the impact of timing anomalies in CRTES with caches. For the first time, a definition and taxonomy of timing anomalies is given for Measurement-Based Timing Analysis. Then, we focus on a specific timing anomaly that can happen with caches and provide a solution to account for it in the execution time estimates.
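
    For context on the baseline these policies improve upon, the sketch below models plain random replacement for a single set of a set-associative cache, where the victim way is drawn uniformly at random on a miss. The cache geometry and data structures are illustrative assumptions, not the policies proposed in the Thesis.

```c
/* Minimal sketch of random replacement (RR) for one set of an N-way
 * cache: on a miss the victim way is drawn uniformly at random, so no
 * fixed access pattern can systematically evict the same line.
 * Hypothetical simulator code, not the Thesis' proposed policies. */
#include <stdbool.h>
#include <stdint.h>
#include <stdlib.h>

#define WAYS 4

typedef struct {
    uint64_t tag[WAYS];
    bool     valid[WAYS];
} cache_set_t;

/* Returns true on a hit; on a miss, installs the line in a random way. */
bool access_set(cache_set_t *set, uint64_t tag)
{
    for (int w = 0; w < WAYS; w++)
        if (set->valid[w] && set->tag[w] == tag)
            return true;                  /* hit */

    int victim = rand() % WAYS;           /* random victim selection */
    set->tag[victim] = tag;
    set->valid[victim] = true;
    return false;                         /* miss */
}
```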

    Behavior characterization of the shared last-level cache in a chip multiprocessor

    This project consists of analyzing different aspects of the memory hierarchy and understanding its influence on overall system performance. The aspects analyzed are cache replacement algorithms, memory mapping schemes and memory page policies.

    A confidence assessment of WCET estimates for software time randomized caches

    Obtaining Worst-Case Execution Time (WCET) estimates is a required step during software verification of real-time embedded systems. Measurement-Based Probabilistic Timing Analysis (MBPTA) aims at obtaining WCET estimates for industrial-size software running on hardware platforms comprising high-performance features. MBPTA relies on randomizing the timing behavior (functional behavior is left unchanged) of hard-to-predict events, like the location of objects in memory and hence their associated cache behavior, that significantly impact software's WCET estimates. Software time-randomized caches (sTRc) have recently been proposed to enable MBPTA on top of Commercial Off-The-Shelf (COTS) caches (e.g., with modulo placement). However, some random events may challenge MBPTA reliability on top of sTRc. In this paper, for sTRc and programs with homogeneously accessed addresses, we determine whether the number of observations taken at analysis, as part of the normal MBPTA application process, captures the cache events significantly impacting execution time and WCET. If this is not the case, our techniques provide the user with the number of extra runs to perform to guarantee that cache events are captured for a reliable application of MBPTA. Our techniques are evaluated with synthetic benchmarks and an avionics application. The research leading to these results has received funding from the European Community's Seventh Framework Programme [FP7/2007-2013] under the PROXIMA Project (www.proxima-project.eu), grant agreement no. 611085. This work has also been partially supported by the Spanish Ministry of Science and Innovation under grant TIN2015-65316, the HiPEAC Network of Excellence, and COST Action IC1202: Timing Analysis on Code-Level (TACLe). Jaume Abella has been partially supported by the Ministry of Economy and Competitiveness under Ramon y Cajal postdoctoral fellowship number RYC-2013-14717.
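
    To illustrate the idea behind software time randomization on a COTS cache with modulo placement (a hedged sketch, not the sTRc implementation evaluated in the paper): shifting an object's base address by a random number of cache lines at every run changes the sets it maps to, and hence its conflicts, from run to run. The cache geometry and the helper alloc_randomized below are assumptions introduced for illustration.

```c
/* Sketch of software randomization on a COTS cache with modulo
 * placement: a per-run random shift of an object's base address
 * changes which sets the object occupies.  Assumed geometry and
 * hypothetical helpers, not the paper's sTRc implementation. */
#include <stdint.h>
#include <stdlib.h>

#define LINE_SIZE 64u
#define NUM_SETS  256u    /* assumed COTS cache geometry */

/* With modulo placement, the set index of an address is: */
unsigned set_index(uintptr_t addr)
{
    return (unsigned)((addr / LINE_SIZE) % NUM_SETS);
}

/* Allocate 'size' bytes at a randomly chosen cache-line offset.
 * (The unpadded base pointer would be needed to free the memory;
 * that bookkeeping is omitted for brevity.) */
void *alloc_randomized(size_t size)
{
    size_t pad = (size_t)(rand() % NUM_SETS) * LINE_SIZE;  /* per-run shift */
    uint8_t *base = malloc(size + (size_t)NUM_SETS * LINE_SIZE);
    return base ? base + pad : NULL;
}
```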

    Performance analysis and optimization of automotive GPUs

    Advanced Driver Assistance Systems (ADAS) and Autonomous Driving (AD) have drastically increased the performance demands of automotive systems. Suitable high-performance platforms building upon Graphics Processing Units (GPUs) have been developed to respond to this demand, with the NVIDIA Jetson TX2 being a relevant representative. However, whether high-performance GPU configurations are appropriate for automotive setups remains an open question. This paper aims at shedding light on this question by modelling an automotive GPU (Jetson TX2), analyzing its microarchitectural parameters against relevant benchmarks, and identifying specific configurations able to meaningfully increase performance within similar cost envelopes, or to decrease costs while preserving original performance levels. Overall, our analysis opens the door to the optimization of automotive GPUs for further system efficiency. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772773) and the HiPEAC Network of Excellence. Pedro Benedicte and Jaume Abella have been partially supported by the MINECO under grant FPU15/01394 and Ramon y Cajal postdoctoral fellowship number RYC-2013-14717, respectively, and Leonidas Kosmidis under a Juan de la Cierva-Formación postdoctoral fellowship (FJCI-2017-34095).

    Locality-aware cache random replacement policies

    Measurement-Based Probabilistic Timing Analysis (MBPTA) facilitates the analysis of complex software running on hardware comprising high-performance features. MBPTA also aims at preventing additional costs for timing analysis while preserving the confidence on derived WCET estimates. Cache behavior has a deep influence on WCET estimates and hence on “the amount of software” that can be consolidated onto a single hardware platform. Deterministic replacement policies such as LRU (Least Recently Used) and NMRU (Non-Most Recently Used) have systematic pathological cases that may lead to high execution times and WCET estimates. Instead, random replacement (RR) decreases the probability of pathological cases, at the cost of sacrificing temporal locality. We present two new MBPTA-amenable replacement policies that completely remove the presented pathological cases. The first policy, Random Permutations (RP), preserves higher temporal locality than RR, while the second, NMRU Random Permutations (NMRURP), also protects the Most Recently Used line from eviction. Both proposed policies build upon restricted random replacement choices. Our simulation evaluation (validated against a real prototype) using the Mälardalen benchmarks and a case study shows that RP and NMRURP deliver both high average performance (within 1% of LRU's and NMRU's performance) and tight WCET estimates, 11% and 24% lower than those of RR, respectively. This work has been partially supported by the Spanish Ministry of Economy and Competitiveness (MINECO) under grant TIN2015-65316-P, the European Research Council (ERC) under the European Union's Horizon 2020 research and innovation programme (grant agreement No. 772773) and the HiPEAC Network of Excellence. Pedro Benedicte and Jaume Abella have been partially supported by the MINECO under grant FPU15/01394 and Ramon y Cajal postdoctoral fellowship number RYC-2013-14717, respectively.
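
    One plausible reading of a policy built upon "restricted random replacement choices" is sketched below: victims follow a random permutation of the ways, and a fresh permutation is drawn only once all ways have been used, which bounds how soon a just-installed line can be evicted again. This is an illustrative sketch under assumed parameters, not necessarily the exact RP or NMRURP policies of the paper.

```c
/* Illustrative permutation-based victim selection: ways are evicted
 * following a random permutation that is re-drawn only after all ways
 * have been used, so a recently installed line is not immediately
 * evicted again.  Not necessarily the exact RP/NMRURP policies. */
#include <stdlib.h>

#define WAYS 4

typedef struct {
    int perm[WAYS];   /* current random order of victim ways */
    int next;         /* position of the next victim in perm[] */
} rp_state_t;

static void reshuffle(rp_state_t *s)
{
    for (int i = 0; i < WAYS; i++)
        s->perm[i] = i;
    for (int i = WAYS - 1; i > 0; i--) {   /* Fisher-Yates shuffle */
        int j = rand() % (i + 1);
        int tmp = s->perm[i];
        s->perm[i] = s->perm[j];
        s->perm[j] = tmp;
    }
    s->next = 0;
}

/* Call once per set before the first miss. */
void rp_init(rp_state_t *s) { reshuffle(s); }

/* Pick the victim way for a miss in this set. */
int rp_victim(rp_state_t *s)
{
    if (s->next == WAYS)                   /* permutation exhausted */
        reshuffle(s);
    return s->perm[s->next++];
}
```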

    Improving time-randomized cache design

    Enabling timing analysis for caches has been pursued by the critical real-time embedded systems (CRTES) community for years due to their potential to reduce worst-case execution times (WCET). Measurement-based probabilistic timing analysis (MBPTA) techniques have emerged as a solution to time-analyze complex hardware, including caches, as long as they implement some random policies. Existing random placement and replacement policies have been proven efficient to some extent for single-level caches. However, they may lead to some probabilistic pathological eviction scenarios. In this work we propose new random placement and replacement policies specifically tailored for multi-level caches and for avoiding any type of pathological case.
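
    To make the notion of random placement concrete, the sketch below derives the set index from a hash of the line address mixed with a random seed that changes across runs, so the set a line maps to (and therefore its conflicts) is randomized. The hash function, seed handling and cache geometry are illustrative assumptions, not the specific multi-level policies proposed in this work.

```c
/* Illustrative hash-based random placement: the set index depends on
 * both the line address and a per-run random seed, so conflicts are
 * randomized across runs.  Assumed geometry and mixing function, not
 * the specific policies proposed in this work. */
#include <stdint.h>

#define LINE_BITS 6u                        /* 64-byte cache lines */
#define SET_BITS  8u                        /* 256 sets */

unsigned random_placement(uint64_t addr, uint64_t seed)
{
    uint64_t line = addr >> LINE_BITS;      /* drop the line-offset bits */
    uint64_t h = line ^ seed;
    h ^= h >> 33;                           /* simple multiply-xor mixing */
    h *= 0xff51afd7ed558ccdull;
    h ^= h >> 33;
    return (unsigned)(h & ((1u << SET_BITS) - 1u));
}
```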

    SafeDE: A low-cost hardware solution to enforce diverse redundancy in multicores

    Failure risk must be kept extremely low in high-integrity systems, such as those in cars, satellites and aircraft. Hence, safety measures must be deployed to avoid a single fault leading to a failure. Redundancy has often been used to address this concern, but it has been proven insufficient if a single fault can cause the same error in all redundant elements, which defeats the purpose of redundancy for error detection. Hence, to avoid this scenario, diversity is implemented along with redundancy, with lockstep execution being the most popular diverse redundancy solution for computing cores. However, classic lockstep solutions have non-negligible limitations if implemented in hardware (e.g., half of the cores can only be used for redundant execution and are not even visible at user level) or in software (e.g., the software loop to enforce staggering is long and costs performance). This paper tackles the limitations of classic lockstep solutions by providing an extended analysis and evaluation of SafeDE, a Diversity Enforcement hardware module combining the short diversity-enforcement loop of hardware solutions with the non-intrusiveness of software solutions. Hence, cores can operate in lockstep mode efficiently or run independent tasks. In this paper, we present SafeDE and its rationale, its application to N-modular systems, its hardware and software integration, and an evaluation showing its performance and area efficiency, and its behavior in the presence of faults. This work was supported in part by the European Union's Horizon 2020 Research and Innovation Programme under Grant 871467, and in part by the Spanish Ministry of Science and Innovation under Grant PID2019-107255GB-C21/AEI/10.13039/501100011033.
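
    A simplified software model of the staggering idea (not SafeDE's actual hardware design or register interface) is sketched below: the enforcement logic tracks the retired-instruction counters of the head and trail cores and stalls the trailing core whenever the gap falls below a minimum threshold, so the two cores never traverse the same instruction at the same time. The threshold value is an assumption.

```c
/* Conceptual model of diversity (staggering) enforcement between two
 * redundant cores: if the trailing core gets closer than a minimum
 * instruction-count distance to the head core, it is stalled until the
 * gap is restored.  Simplified illustration of the mechanism, not
 * SafeDE's actual hardware design or register interface. */
#include <stdbool.h>
#include <stdint.h>

#define MIN_STAGGER 100u   /* assumed minimum instruction-count gap */

typedef struct {
    uint64_t icount_head;   /* retired instructions, head core  */
    uint64_t icount_trail;  /* retired instructions, trail core */
} stagger_monitor_t;

/* Evaluated every cycle with fresh counters; returns true if the
 * trailing core must be stalled this cycle. */
bool stall_trail_core(const stagger_monitor_t *m)
{
    uint64_t gap = m->icount_head - m->icount_trail;
    return gap < MIN_STAGGER;
}
```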

    Towards limiting the impact of timing anomalies in complex real-time processors

    Timing verification of embedded critical real-time systems is hindered by complex designs. Timing anomalies, deeply analyzed in static timing analysis, require specific solutions to bound their impact. For the first time, we study the concept and impact of timing anomalies in measurement-based timing analysis, the most widely used approach in industry, showing that they need to be considered and handled differently. In addition, we analyze anomalies in the context of Measurement-Based Probabilistic Timing Analysis, which simplifies quantifying their impact.

    SafeDM: a hardware diversity monitor for redundant execution on non-lockstepped cores

    Computing systems in the safety domain, such as those in avionics or space, require specific safety measures related to the criticality of the deployment. A problem these systems face is that of transient failures in hardware. A solution commonly used to tackle potential failures is to introduce redundancy, for example two cores that execute the same program at the same time. However, redundancy does not solve all potential failures, such as Common Cause Failures (CCF), where a single fault affects both cores identically (e.g., a voltage droop). If both redundant cores have identical state when the fault occurs, there may be a CCF, since the fault can affect both cores in the same way. To avoid CCF, it is critical to know that there is diversity in the execution across the redundant cores. In this paper we introduce SafeDM, a hardware Diversity Monitor that quantifies the diversity of each redundant processor to guarantee that CCF will not go unnoticed, without the need to deploy lockstepped cores. SafeDM computes data and instruction diversity separately, using different techniques appropriate for each case. We integrate SafeDM into a RISC-V FPGA space MPSoC from Cobham Gaisler, where SafeDM is proven effective with a large benchmark suite while incurring low area and power overheads. Overall, SafeDM is an effective hardware solution to quantify diversity in cores performing redundant execution. This work was supported by the EU's Horizon 2020 programme under grant no. 871467 and the Spanish MSI under grant PID2019-107255GB-C21/AEI/10.13039/501100011033.
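
    A conceptual model of diversity monitoring (not SafeDM's actual signature functions or integration) is sketched below: each cycle, per-core signatures are computed from the program counter (instruction diversity) and a hash of the register file (data diversity), and equal signatures across the two redundant cores flag an instant with no diversity, i.e. a window of exposure to CCF.

```c
/* Conceptual sketch of diversity monitoring for two non-lockstepped
 * redundant cores: equal instruction and data signatures in the same
 * cycle mean no diversity at that instant.  Illustrative model only,
 * not SafeDM's actual signature functions or hardware integration. */
#include <stdbool.h>
#include <stdint.h>

typedef struct {
    uint64_t pc;          /* program counter              */
    uint64_t regs[32];    /* architectural register file  */
} core_state_t;

/* FNV-1a hash over the register file as a simple data signature. */
static uint64_t data_signature(const core_state_t *c)
{
    uint64_t h = 0xcbf29ce484222325ull;
    for (int i = 0; i < 32; i++) {
        h ^= c->regs[i];
        h *= 0x100000001b3ull;
    }
    return h;
}

/* True if the two cores show no diversity (same instruction and data
 * signatures) in the current cycle. */
bool no_diversity(const core_state_t *a, const core_state_t *b)
{
    return a->pc == b->pc && data_signature(a) == data_signature(b);
}
```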